Formant diphone parameter extraction utilising a labelled single-speaker database
Abstract
This paper examines a method for extracting formant parameters from a labelled single-speaker database for use in a formant-parameter diphone-concatenation speech synthesis system. The procedure begins with an initial formant analysis of the labelled database, which is then used to obtain formant (F1-F5) probability spaces for each phoneme. These probability spaces guide a more careful speaker-specific extraction of formant frequencies. An analysis-by-synthesis procedure is then used to find the best-matching formant intensity and bandwidth parameters. The great majority of the parameters extracted in this way produce speech that is highly intelligible and has a voice quality close to that of the original speaker.
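The probability-guided step can be pictured with a small sketch. The abstract does not say how the probability spaces are modelled, so everything here is an assumption made for illustration: each formant of each phoneme is treated as an independent Gaussian fitted to the initial analysis, the spread is floored at 50 Hz, and the function names `build_formant_spaces` and `pick_formants` are hypothetical. It is not the paper's procedure, only one plausible reading of it.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

MIN_SPREAD_HZ = 50.0  # assumed floor so sparse phonemes do not get near-zero variance


def build_formant_spaces(frames):
    """Fit a (mean, std) pair per formant slot for every phoneme label.

    frames: iterable of (phoneme_label, [F1, F2, ...] in Hz) taken from an
    initial, possibly error-prone, formant analysis of the labelled database.
    """
    by_phoneme = defaultdict(list)
    for label, formants in frames:
        by_phoneme[label].append(formants)
    spaces = {}
    for label, rows in by_phoneme.items():
        columns = zip(*rows)  # one column per formant slot
        spaces[label] = [(mean(c), max(pstdev(c), MIN_SPREAD_HZ)) for c in columns]
    return spaces


def log_gauss(x, mu, sigma):
    """Log-density of a 1-D Gaussian, used to score a candidate frequency."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def pick_formants(label, candidates, spaces):
    """Greedily assign candidate peak frequencies (Hz) to formant slots.

    Each slot takes the most probable candidate that lies above the
    previously chosen formant; a slot with no usable candidate gets None.
    """
    chosen = []
    floor = 0.0
    for mu, sigma in spaces[label]:
        usable = [c for c in candidates if c > floor]
        if not usable:
            chosen.append(None)
            continue
        best = max(usable, key=lambda c: log_gauss(c, mu, sigma))
        chosen.append(best)
        floor = best
    return chosen


# Toy data: three frames labelled "a" with F1-F3 from an initial analysis.
initial = [("a", [700, 1200, 2500]), ("a", [720, 1250, 2600]), ("a", [680, 1150, 2550])]
spaces = build_formant_spaces(initial)
# Candidate spectral peaks for a new "a" frame, including two spurious ones.
print(pick_formants("a", [350, 710, 1190, 1900, 2540], spaces))
```

In this toy case the spurious peaks at 350 Hz and 1900 Hz are rejected because they are improbable for every formant slot of the phoneme. A fuller treatment would score all slots jointly rather than greedily, and the intensity and bandwidth fitting by analysis-by-synthesis is not covered here.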
Similar resources
Expressing vocal effort in con
A new diphone database with a full diphone set for each of three levels of vocal effort is presented. A theoretical motivation is given why this kind of database will be useful for emotional speech synthesis. Two hypotheses are verified in perception experiments: (I) The three diphone sets are perceived as belonging to the same speaker; (II) The vocal effort intended during database recordings ...
On the reduction of concatenation artefacts in diphone synthesis
One well-known problem with diphone concatenation is the occurrence of audible discontinuities at diphone boundaries, which are most prominent in vowels and semi-vowels. Significant formant jumps at certain boundaries suggest that the problem is of a spectral nature. We have examined this hypothesis by correlating the results of a listening experiment with spectral distances measured across dip...
Improving Speaker Identification Performance by Combining Vocal Tract Features
This paper proposes fusion and addition techniques of vocal tract features such as Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Mel Frequency Cepstral Coefficients (DMFCC) in speaker identification. Feature extraction plays an important role as a front end processing block in Speaker Identification (SI) process. Mel frequency features are used to extract the spectral characteristics o...
Optimizing Vowel Formant Measurements in Four Acoustic Analysis Systems for Diverse Speaker Groups.
PURPOSE: This study systematically assessed the effects of select linear predictive coding (LPC) analysis parameter manipulations on vowel formant measurements for diverse speaker groups using 4 trademarked Speech Acoustic Analysis Software Packages (SAASPs): CSL, Praat, TF32, and WaveSurfer. METHOD: Productions of 4 words containing the corner vowels were recorded from 4 speaker groups with ty...
Speaker conversion in ARX-based source-
A speaker conversion framework for formant synthesis is proposed. With this framework, given a small set of a target speaker’s utterances, segmental features of an original speech can be converted to those of the given speaker. Unlike other speaker conversion frameworks, further voice quality modification can also be applied to the converted speech with conventional formant modification techniq...